A Two-Phase Spectral Bigraph Co-clustering Approach for the “Who Rated What” Task in KDD Cup 2007

Authors

Ting Liu

Yonghong Tian

Wen Gao

Abstract

This paper describes our approach to the “Who Rated What” task of the KDD Cup 2007 competition. The Netflix data set consists of more than 100 million ratings made between 1998 and 2005, and the task is to predict the probability that each user-movie pair in the test set was rated in 2006. In total, 100,000 user-movie pairs were drawn from the Netflix data set to form the test set. In our approach, the Netflix data set is modeled as a bipartite graph (bigraph) with users on one side and movies on the other. The bigraph contains only directed edges from user nodes to movie nodes, and each directed edge corresponds to a rating event in which the user rated the movie at some time. The task can then be formulated as a link existence prediction problem: deciding whether a directed link exists between a given user node and a given movie node. Given the huge size of the data set and the sparsity of its ratings, it is important to reveal the hidden class-based correlations between users and movies in the bigraph while keeping computational complexity relatively low. To this end, we use a two-phase spectral bigraph co-clustering approach. The key idea is to obtain user and movie neighborhoods simultaneously via co-clustering and then generate predictions from the co-clustering results. Our approach consists of three steps. First, users and movies are each coarsely clustered using the K-means algorithm. Second, the resulting user and movie clusters are co-clustered using a multipartite spectral graph partitioning algorithm. Third, based on the co-clustering results, a probabilistic model is derived to predict the probability that a link exists between a user node and a movie node. Experimental results show that our approach performs well on this task.
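
The second phase and the prediction step can be illustrated with a small sketch. The Python code below is an assumption-laden illustration, not the authors' implementation. It skips the K-means phase and co-clusters a tiny 0/1 user-by-movie matrix directly (in the described pipeline, the input would instead be the matrix linking the coarse user and movie clusters). It performs only a two-way split using the standard bipartite spectral recipe (normalize A_n = D1^{-1/2} A D2^{-1/2}, compute its second left and right singular vectors by power iteration, split by sign) rather than a general multipartite partition. The function names spectral_cocluster_2way and block_link_probability are hypothetical, and the link model, which scores a user-movie pair by the edge density of the co-cluster block it falls into, is a simple stand-in for the paper's probabilistic model.

```python
import math
import random

# Hypothetical sketch, not the authors' code: a two-way bipartite spectral
# co-clustering step plus a simple block-density link model.


def spectral_cocluster_2way(A, iters=200, seed=0):
    """Split rows (users) and columns (movies) of a nonnegative matrix into two co-clusters.

    Assumes every row and every column of A has at least one nonzero entry.
    """
    n_rows, n_cols = len(A), len(A[0])
    d1 = [sum(row) for row in A]  # user degrees
    d2 = [sum(A[i][j] for i in range(n_rows)) for j in range(n_cols)]  # movie degrees
    # Normalized matrix A_n = D1^{-1/2} A D2^{-1/2}.
    An = [[A[i][j] / math.sqrt(d1[i] * d2[j]) for j in range(n_cols)]
          for i in range(n_rows)]
    # The top left singular vector of A_n is sqrt(d1), normalized (singular value 1).
    total = math.sqrt(sum(d1))
    u1 = [math.sqrt(d) / total for d in d1]

    rng = random.Random(seed)
    u = [rng.uniform(-1.0, 1.0) for _ in range(n_rows)]
    for _ in range(iters):
        # Deflate: remove the trivial direction u1 so we converge to the second vector.
        dot = sum(a * b for a, b in zip(u, u1))
        u = [a - dot * b for a, b in zip(u, u1)]
        # One power-iteration step with A_n A_n^T.
        v = [sum(An[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]
        u = [sum(An[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
        norm = math.sqrt(sum(a * a for a in u))
        u = [a / norm for a in u]

    # Matching right singular vector, up to the positive factor 1 / sigma_2.
    v = [sum(An[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]
    # Embedding z = D^{-1/2} [u; v]; D^{-1/2} is positive, so only signs matter.
    z_rows = [a / math.sqrt(d) for a, d in zip(u, d1)]
    z_cols = [b / math.sqrt(d) for b, d in zip(v, d2)]
    sign = 1.0 if z_rows[0] >= 0 else -1.0  # label row 0 as cluster 0
    row_labels = [0 if sign * z >= 0 else 1 for z in z_rows]
    col_labels = [0 if sign * z >= 0 else 1 for z in z_cols]
    return row_labels, col_labels


def block_link_probability(A, row_labels, col_labels, k=2):
    """Hypothetical link model: P(user u rated movie m) = edge density of its block."""
    edges = [[0.0] * k for _ in range(k)]
    for i, row in enumerate(A):
        for j, a in enumerate(row):
            edges[row_labels[i]][col_labels[j]] += a
    n_r = [row_labels.count(r) for r in range(k)]
    n_c = [col_labels.count(c) for c in range(k)]
    # Empty blocks have no edges, so dividing by 1 instead of 0 yields density 0.
    density = [[edges[r][c] / (n_r[r] * n_c[c] or 1) for c in range(k)]
               for r in range(k)]

    def predict(user, movie):
        return density[row_labels[user]][col_labels[movie]]

    return predict


# Toy 0/1 "who rated what" matrix: rows are users, columns are movies.
ratings = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 1],
    [0, 0, 0, 1, 1, 0],
]
rows, cols = spectral_cocluster_2way(ratings)
predict = block_link_probability(ratings, rows, cols)
print(rows, cols)               # [0, 0, 0, 1, 1, 1] [0, 0, 0, 1, 1, 1]
print(round(predict(1, 2), 3))  # 0.889 (user 1 never rated movie 2; block density 8/9)
print(round(predict(1, 5), 3))  # 0.111 (block density 1/9)
```

On the toy matrix, user 1 never rated movie 2, yet both land in the same dense co-cluster, so the sketch assigns that unobserved pair a high link probability. Splitting by sign is the simplest way to turn the one-dimensional spectral embedding into two clusters; producing more than two co-clusters would require more singular vectors and, typically, running k-means on the resulting embedding.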

Similar resources

Who Rated What: a combination of SVD, correlation and frequent sequence mining

KDD Cup 2007 focuses on predicting aspects of movie rating behavior. We present our prediction method for Task 1 “Who Rated What in 2006” where the task is to predict which users rated which movies in 2006. We use the combination of the following predictors, listed in the order of their efficiency in the prediction: • The predicted number of ratings for each movie based on time series predictio...

Implementation of Fuzzy c-Means and Outlier Detection for Intrusion Detection with KDD Cup 1999 Data Set

In this paper, a two-phase method for computer network intrusion detection is proposed. In the first phase, a set of patterns (data) are clustered by the fuzzy c-means algorithm. In the second phase, outliers are constructed by a distance-based technique and a class label is assigned to each pattern. The KDD Cup 1999 data set is used for the experiment. The results show that, for binary classif...

Intrusion Detection based on a Novel Hybrid Learning Approach

Information security and Intrusion Detection Systems (IDS) play a critical role on the Internet. An IDS is an essential tool for detecting different kinds of attacks in a network and maintaining data integrity, confidentiality and system availability against possible threats. In this paper, a hybrid approach towards achieving high performance is proposed. In fact, the important goal of this paper ...

Taxonomy-Informed Latent Factor Models for Implicit Feedback

We describe an approach based on latent factor models to the Track 2 task of KDD Cup 2011, which required learning to discriminate between highly rated and unrated items from a large dataset of music ratings. We take the pairwise ranking route, training our models to rank the highly rated items above the unrated items which are sampled from the same distribution. Using the item relationship inf...

A Hybrid Framework for Building an Efficient Incremental Intrusion Detection System

In this paper, a boosting-based incremental hybrid intrusion detection system is introduced. This system combines incremental misuse detection and incremental anomaly detection. We use a boosting ensemble of weak classifiers to implement the misuse intrusion detection system. It can identify new classes of intrusions that do not exist in the training dataset for incremental misuse detection. As...

Publication date: 2007
